Exploiting Data Skew for Improved Query Performance

Author: Ross Kenneth A.
Zhang Wangda
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2019

Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution, with a small number of very common words and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make full use of caches to exploit skew. In particular, a whole cache line may remain cache resident even though only a small part of it corresponds to a popular data item. In this paper, we propose a novel index structure that repositions data items to concentrate popular items into the same cache lines. The net result is better spatial locality and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude.

arXiv.org e-Print Archive

Crossref
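
The abstract above describes repositioning items so that popular ones share cache lines. The following is only a toy illustration of that layout effect, not the paper's index structure: it assumes 64-byte cache lines, fixed 8-byte items, and a known access trace, and the names `lines_touched` and `hot_first_layout` are hypothetical. It counts how many distinct cache lines a trace touches before and after placing the most frequently accessed items first.

```python
from collections import Counter

CACHE_LINE_BYTES = 64  # assumed cache-line size
ITEM_BYTES = 8         # assumed fixed item size (e.g. 8-byte values)
ITEMS_PER_LINE = CACHE_LINE_BYTES // ITEM_BYTES


def lines_touched(layout, accesses):
    """Number of distinct cache lines read by a sequence of item accesses."""
    position = {item: i for i, item in enumerate(layout)}
    return len({position[item] // ITEMS_PER_LINE for item in accesses})


def hot_first_layout(items, accesses):
    """Hypothetical reordering: most frequently accessed items come first.

    sorted() is stable, so items with equal frequency keep their original order.
    """
    freq = Counter(accesses)  # missing items count as 0
    return sorted(items, key=lambda item: -freq[item])


items = list(range(64))          # 64 items -> 8 cache lines
hot = list(range(0, 64, 8))      # one popular item in each cache line
accesses = hot * 100             # skewed trace: only the popular items are read

print(lines_touched(items, accesses))                              # 8
print(lines_touched(hot_first_layout(items, accesses), accesses))  # 1
```

With the original layout every cache line holds one popular item and seven cold ones, so all eight lines must stay resident; after the reordering the same trace fits in a single line.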

Conditionally Risk-Averse Contextual Bandits

Author: Farsang Mónika
Mineiro Paul
Zhang Wangda
Publication date: 08/07/2023

Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless, we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments in diverse scenarios where worst-case outcomes should be avoided, spanning dynamic pricing, inventory management, and self-tuning software, including a production exascale data processing system.

arXiv.org e-Print Archive
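
To make concrete why a risk-averse learner must look at the whole reward distribution rather than its mean, here is a small sketch using conditional value-at-risk (CVaR), the average of the worst alpha-fraction of outcomes. CVaR is an assumption chosen for illustration; the abstract does not say which risk measure the paper uses, and no bandit exploration is shown. The two hypothetical arms have the same mean reward but very different worst cases.

```python
import math


def cvar(rewards, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) rewards (at least one)."""
    worst = sorted(rewards)[: max(1, math.ceil(alpha * len(rewards)))]
    return sum(worst) / len(worst)


safe = [1.0] * 10            # hypothetical arm: always pays 1
risky = [3.0] * 9 + [-17.0]  # hypothetical arm: usually pays 3, rarely loses 17

print(sum(safe) / len(safe), sum(risky) / len(risky))  # 1.0 1.0
print(cvar(safe, 0.1), cvar(risky, 0.1))               # 1.0 -17.0
```

An average-case learner is indifferent between the two arms, while a CVaR-based criterion strongly prefers `safe`.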

Optimizing Query Processing Under Skew

Author: Zhang Wangda
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020

Big data systems such as relational databases, data science platforms, and scientific workflows all process queries over large and complex datasets. Skew is common in these real-world datasets and workloads, and different types of skew can have different impacts on query-processing performance. Although skew sometimes causes load imbalance in a parallel execution environment, negatively impacting query performance, we demonstrate in this thesis that in many cases we can actually improve query performance in the presence of skew. To optimize query processing under skew, we develop a set of techniques that exploit the positive effects of skew and avoid the negative ones. To exploit skew, we propose techniques including: (a) intentionally creating skew and clustering data in a distributed database system; (b) optimizing data layout for better caching in main-memory databases; and (c) adaptive execution techniques that are responsive to the underlying data in the context of compilers. To ameliorate skew, we study optimized hash-based partitioning that alleviates outliers in a genomic data context, as well as parallel prefix sum algorithms used to develop skew-insensitive algorithms. We evaluate the effectiveness of our techniques on synthetic data, standard benchmarks, and empirical datasets, and show that the performance of query processing under skew can be greatly improved. Overall, this thesis makes a concrete contribution to skew-related query processing.

Columbia University Academic Commons
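
The abstract above pairs hash-based partitioning with prefix sums. A common way to combine them, shown here as a generic sketch rather than the thesis's own algorithms, is to build a per-partition histogram, take its exclusive prefix sum to get each partition's starting offset, and then scatter keys into one contiguous output array. The function names are hypothetical, the prefix sum is computed sequentially (parallel versions compute the same result in logarithmic depth), and integer keys are assumed because Python's `hash` of an `int` is deterministic.

```python
from itertools import accumulate


def exclusive_prefix_sum(counts):
    """[c0, c1, c2, ...] -> [0, c0, c0 + c1, ...]."""
    return [0] + list(accumulate(counts))[:-1]


def hash_partition(keys, num_partitions):
    """Scatter keys into contiguous per-partition regions of one output array."""
    part_of = [hash(k) % num_partitions for k in keys]
    counts = [0] * num_partitions          # histogram of partition sizes
    for p in part_of:
        counts[p] += 1
    offsets = exclusive_prefix_sum(counts)  # start index of each partition
    out = [None] * len(keys)
    cursor = offsets[:]                     # next free slot per partition
    for key, p in zip(keys, part_of):
        out[cursor[p]] = key
        cursor[p] += 1
    return out, offsets, counts


out, offsets, counts = hash_partition([5, 1, 4, 2, 3, 0, 6], 3)
print(counts)   # [3, 2, 2]
print(offsets)  # [0, 3, 5]
print(out)      # [3, 0, 6, 1, 4, 5, 2]
```

Because the offsets come from the counts, a heavily skewed partition simply receives a larger contiguous region; no per-partition buffers need to be resized.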

Evaluating multi-way joins over discounted hitting time

Author: Zhang Wangda
张望达
Publication venue: 'The University of Hong Kong Libraries'
Publication date: 01/01/2013

The prevalence of graphs in emerging applications has recently attracted considerable research interest. To acquire interesting information hidden in large graphs, tasks including link prediction, collaborative recommendation, and reputation ranking all make use of proximities between graph nodes. The discounted hitting time (DHT), a random-walk similarity measure for graph node pairs, has been shown to be useful in various applications. In this thesis, we examine a novel query, called the multi-way join (or n-way join), over DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a ranked list of n-tuples with the k highest scores, according to some aggregation function of DHT values. By extracting such top-k results, this query enables the analysis and prediction of various complex relationships among n sets of nodes on a large graph. Since an n-way join is expensive to evaluate, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since PJ may require computing top-(m + 1) 2-way joins, we study an incremental solution, which avoids recomputation and allows the top-(m + 1) 2-way join results to be derived quickly from the previously computed top-m results. For better performance, we further examine efficient processing algorithms and pruning techniques for 2-way joins. Through extensive experiments on three real graph datasets, we show that the proposed PJ algorithm accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.
Computer Science, Master of Philosophy (published or final version)

HKU Scholars Hub
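
The incremental idea above, extending a top-m 2-way join to top-(m + 1) without starting over, can be illustrated with a standard priority-queue enumeration. This is a hypothetical sketch, not PJ or the thesis's pruning method: DHT is treated as a black-box function `score(u, v)`, and all names and scores below are made up. Each `next()` on the generator returns the next-best pair, reusing the heap state left by earlier calls.

```python
import heapq


def ranked_pairs(score, left, right):
    """Yield (score, u, v) for u in left, v in right, in descending score order.

    Each next() extends the top-m answer to top-(m + 1) without recomputing
    the pairs already returned.
    """
    # For each u, its right-hand partners sorted by decreasing score.
    ordered = {u: sorted(right, key=lambda v: -score(u, v)) for u in left}
    # Max-heap (via negated scores) of each u's best not-yet-returned partner.
    heap = [(-score(u, ordered[u][0]), u, 0) for u in left if right]
    heapq.heapify(heap)
    while heap:
        neg, u, i = heapq.heappop(heap)
        yield -neg, u, ordered[u][i]
        if i + 1 < len(right):
            v = ordered[u][i + 1]
            heapq.heappush(heap, (-score(u, v), u, i + 1))


# Hypothetical pairwise scores standing in for DHT values.
S = {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "x"): 0.5, ("b", "y"): 0.7}
gen = ranked_pairs(lambda u, v: S[(u, v)], ["a", "b"], ["x", "y"])
top2 = [next(gen) for _ in range(2)]  # [(0.9, 'a', 'x'), (0.7, 'b', 'y')]
third = next(gen)                     # (0.5, 'b', 'x'), derived from saved state
```

The heap holds at most one candidate per left-hand node, so asking for one more result costs a single pop and push rather than a fresh join.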

Evaluating Multi-Way Joins over Discounted Hitting Time

Author: Ben Kao
Reynold Cheng
Wangda Zhang
Publication date: 01/01/2014

The discounted hitting time (DHT), a random-walk similarity measure for graph node pairs, is useful in various applications, including link prediction, collaborative recommendation, and reputation ranking. We examine a novel query, called the multi-way join (or n-way join), on DHT scores. Given a graph and n sets of nodes, the n-way join retrieves a set of n-tuples with the k highest scores, according to some aggregation function of DHT values. This query enables analysis and prediction of complex relationships among n sets of nodes. Since an n-way join is expensive to compute, we develop the Partial Join algorithm (or PJ). This solution decomposes an n-way join into a number of top-m 2-way joins, and combines their results to construct the answer of the n-way join. Since PJ may necessitate the computation of top-(m + 1) 2-way joins, we study an incremental solution, which allows the top-(m + 1) 2-way join to be derived quickly from the previously computed top-m 2-way join results. We further examine fast processing and pruning algorithms for 2-way joins. An extensive evaluation on three real datasets shows that PJ accurately evaluates n-way joins, and is four orders of magnitude faster than basic solutions.

CiteSeerX

Crossref

HKU Scholars Hub

Supplementary Online Material

Author: AIP Admin (17302936)
Bo Wang, Quanzhi Zhang, Yonghai Guo, Wangda Li, Bo Zhang and Jiangwei Cao (17811692)
Publication date: 09/02/2024

NiO thickness dependence of the exchange bias field; derivation of the NiO thickness dependence of SMR; MR measurement for the control sample; and harmonic Hall measurement.

FigShare

Realization of Multi‐Level State and Artificial Synapses Function in Stacked (Ta/CoFeB/MgO)N Structures

Author: Bo Wang
Bo Zhang
Jiangwei Cao
Keliu Luo
Wangda Li
Wenbo Lv
Yonghai Guo
Publication venue: Wiley-VCH
Publication date: 01/02/2023

Spintronic devices can realize multi‐state storage and can be used to simulate artificial synapses or artificial neurons, giving them promising application prospects in the field of artificial neural networks (ANN). This work investigates current‐induced magnetization reversal in stacked (Ta/CoFeB/MgO)N structures and their application in ANN. It is demonstrated that complete current‐induced magnetization reversal with a large intermediate transition region can be achieved in the sample with N = 2. Magneto‐optical Kerr microscope imaging shows that the large transition region in this sample arises from “layer‐by‐layer” reversal, owing to the difference in coercivity between the two CoFeB layers. In addition, artificial synapse and artificial neuron functions based on current‐induced magnetization reversal in the sample are demonstrated in simulation. These results establish stacked (Ta/CoFeB/MgO)N structures as a promising platform for realizing multi‐level states and artificial synapse functions, with potential applications in the field of ANN.

Directory of Open Access Journals

Tunable Wettability of Biodegradable Multilayer Sandwich-Structured Electrospun Nanofibrous Membranes

Author: A. K. M. Mashud Alam
Chunhui Xiang
Elena Ewaldz
Gea-Jodar
Wangda Qu
Xianglan Bai
Zhang
Publication venue: 'MDPI AG'

Crossref

Predictive deployment of UAV base stations in wireless networks: machine learning meets contract theory

Author: Bennis M. (Mehdi)
Debbah M. (Mérouane)
Lu X. (Xing)
Saad W. (Walid)
Zhang Q. (Qianqian)
Zuo W. (Wangda)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

In this paper, a novel framework is proposed to enable predictive deployment of unmanned aerial vehicles (UAVs) as temporary base stations (BSs) to complement ground cellular systems in the face of downlink traffic overload. First, a novel learning approach, based on the weighted expectation maximization (WEM) algorithm, is proposed to estimate the user distribution and the downlink traffic demand. Next, to guarantee truthful information exchange between the BS and UAVs, an offload contract is developed using the framework of contract theory, and the necessary and sufficient conditions for a feasible contract are analytically derived. Subsequently, an optimization problem is formulated to deploy an optimal UAV onto the hotspot area so that the utility of the overloaded BS is maximized. Simulation results show that the proposed WEM approach yields a prediction error of around 10%. Compared with the expectation maximization and k-means approaches, the WEM method shows a significant advantage in prediction accuracy as the traffic load in the cellular system becomes spatially uneven. Furthermore, compared with two event-driven deployment schemes based on the closest-distance and maximal-energy metrics, the proposed predictive approach enables UAV operators to provide efficient communication service for hotspot users in terms of downlink capacity, energy consumption and service delay. Simulation results also show that the proposed method significantly improves the revenues of both the BS and UAV networks, compared with two baseline schemes.

University of Oulu Repository - Jultika
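
The abstract names a weighted expectation maximization (WEM) algorithm but does not define it. One plausible reading, assumed here purely for illustration, is EM for a Gaussian mixture in which each sample carries a weight (for instance, an observed traffic demand), so heavily weighted samples pull component means harder. The sketch is 1-D for brevity, and `weighted_em_1d` and the sample data are hypothetical; it should not be read as the paper's estimator.

```python
import numpy as np


def weighted_em_1d(x, w, means, iters=50):
    """Fit a 1-D Gaussian mixture where sample i counts with weight w[i]."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    mu = np.asarray(means, float)
    k = len(mu)
    var = np.full(k, np.var(x))  # start every component with the data variance
    pi = np.full(k, 1.0 / k)     # uniform initial mixing proportions
    for _ in range(iters):
        # E-step: responsibility r[i, j] of component j for sample i.
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: as in standard EM, but each responsibility is scaled by w[i].
        rw = r * w[:, None]
        nk = rw.sum(axis=0)
        mu = (rw * x[:, None]).sum(axis=0) / nk
        var = (rw * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9  # avoid collapse
        pi = nk / nk.sum()
    return mu, var, pi


# Two hypothetical user clusters; the sample at 9.9 carries double weight.
mu, var, pi = weighted_em_1d([0.0, 0.1, -0.1, 10.0, 10.1, 9.9],
                             [1, 1, 1, 1, 1, 2], means=[1.0, 8.0])
```

With these inputs the second mean settles near the weighted average 9.975 rather than the unweighted 10.0, and the mixing proportions reflect total weight (about 3/7 and 4/7) rather than sample counts.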